Chapter 1: Computers and Programming

 

Introduction

 

This chapter is a high-level explanation of the concept of machines, their digital species computers, and the important parts of a computer itself: logic/instruction processing, input, output, and storage.

 

Author’s Note: I find that many people have difficulty learning to program simply because they do not fully grasp the concept of what they are trying to manipulate: computers.

 

Abstract Introduction to Computers

 

Machines are things of fixed and moving parts which consume energy and typically do something useful in return.  The term is so broad that practically everything is a machine.  This book defines a machine as an entity which consumes energy to perform tasks in an exact manner without emotion.  When someone works like a “machine”, they work accurately and without tiring or any concept of time.  A machine might then also be an entity that has no concept of time.

 

Computers are machines that can be programmed through symbolic, usually digital[1], means.  Symbolic programming is a broad term for any kind of input through human-understood symbols, such as numbers and letters.  If you have a microwave with numerical buttons, then it’s a computer, because you program it with those buttons to heat something for a given duration at a specific power level.  Some instructions might be pre-recorded, but they are still triggered by some sort of input.

 

Input is any sort of data used by a computer in order to perform a specific action and produce output.  The output of a computer process can be anything that causes the computer to change itself or spit something out, such as data to be used as input for something else.  The input can be from a user, as in the case of pushing buttons on a microwave, or from some internal source such as the output from a previous operation.

 

Most computers process instructions using two distinct parts, one for “thinking” and one for “memory/remembrance” whereas we only have a single brain.  The thinking part of a computer is what understands instructions and carries out the necessary actions based on them.  It is called the CPU or central processing unit.  The CPU can contain some limited memory of its own, such as what action it just initiated and possibly what the results of it were.  Such memory falls under CPU registers and CPU cache and relates to our immediate impulses and thoughts.  But for the bulk of remembrance a computer utilizes storage.

 

The storage of a computer falls into two main categories: temporary and permanent.  Temporary storage is fast and immediately accessible to the computer’s CPU, but exists for a limited amount of time, usually until the computer is turned off.  Permanent storage holds data until it is explicitly modified in some way, such as being changed, moved, or destroyed.

 

The concept of storage is closely tied to input and output.  The contents of storage are viewed through output from the storage and changed through input to it.  Something that stores data is a computer in and of itself, because it has a limited ability to process instructions and it performs input and output.  Storage can be readable, writable, or both.

 

Readable storage is that which can be viewed.  Writable storage is that which can be changed.  Some storage allows only one or the other.  Think of some typical financial accounts.  A normal savings account will allow you to deposit and withdraw money.  A credit account of some kind, such as a credit card, will allow you only to withdraw or use money.  And lastly a long-term account (such as retirement, 401k, IRA, etc.) will only allow you to deposit.  These are examples of read/write, read-only, and write-only storage respectively.

 

Modern Computers

 

This book focuses on modern or general-purpose computers.  These include personal computers or workstations, minicomputers, mainframes, and even supercomputers.  All of these types of computers share some common properties and functionality: digital instructions, visual output on monitors and/or printers, input through a keyboard and possibly a mouse, temporary storage in RAM, and permanent storage on a mass-storage device.

 

The term RAM stands for random access memory, and RAM comes in many types that are not all compatible with one another.  RAM comes on sticks (modules) that plug into the inside of a computer.  This RAM is the “mass” temporary memory accessible to a computer.  The data contained in RAM exists only as long as the computer is turned on.  The moment the computer is turned off, all data in RAM ceases to exist.  Most personal computers perform a basic test of available RAM when the computer first boots up.

 

A general-purpose computer can do an almost limitless number of things and can be extended almost without limit.  It is made of two distinct parts: hardware and software.  Each physical piece of a computer is a piece of hardware, also known as a physical device.  Permanent mass-storage devices, known as drives, contain the software instructions that tell the computer what to do.  Software instructions are instructions that can be altered because they exist on a device that can be changed.  Some hardware devices contain instructions for the computer that cannot be changed, or at least not directly.  These instructions are usually referred to as firmware and usually exist in sections of ROM, or read-only memory[2].

 

The whole is greater than the sum of its parts when it comes to computers.  Many hardware devices must connect to other specific types of hardware in order to function properly.  The end result of all these connections is the computer you see and use.

 

The CPU can contact any device properly connected as part of the computer.  However, the memory the CPU works with directly is typically temporary.  For example, in order to make a change on a permanent storage device, the CPU first makes the change in temporary memory and then sends that data to the storage device with a request to update it.  Almost all changes occur in temporary memory.  This isn’t foreign to how we work.  Most people think of what they’re going to say before saying it, or at least before they write it down.  This process occurs in our immediate thoughts, and then the output from those thoughts is what is used.  Temporary memory is, to a CPU, its immediate thoughts.

 

CPUs contain temporary memory of their own in the form of cache and registers.  CPU cache holds recently used instructions and data to speed up the processing of a list of instructions.  CPU registers are small, very fast storage locations that hold the specific numerical values used in logical decisions and calculations.

 

Files and Directories

 

Data in storage devices for computers is typically organized into files and directories.  A directory is a container for files and child directories and a file is a named element of storage that contains data.  Files may contain other attributes as well.  The structure in which files and directories are stored, as well as what file attributes are supported, is specific to the file system.  Think of a file system as a file cabinet, each drawer as a directory, each folder as a child directory, and each file as something in each folder.
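The file-cabinet analogy can be tried out with a short Python sketch (Python is used here only for illustration, and the names drawer, folder, and letter.txt are made up).  It builds the tree inside a temporary directory, so nothing is left behind, and then uses the standard os.walk function to list every child directory and file it finds:

```python
import os
import tempfile


def build_and_list() -> list[str]:
    """Create a tiny directory tree and return its contents as relative paths."""
    entries = []
    # A temporary directory plays the file cabinet; it is deleted when the block ends.
    with tempfile.TemporaryDirectory() as cabinet:
        # "drawer" is a directory and "folder" is a child directory inside it.
        folder = os.path.join(cabinet, "drawer", "folder")
        os.makedirs(folder)
        # A file is a named element of storage that contains data.
        with open(os.path.join(folder, "letter.txt"), "w") as f:
            f.write("Dear reader,")
        # os.walk visits every directory, reporting its child directories and its files.
        for directory, child_dirs, files in os.walk(cabinet):
            for name in child_dirs:
                rel = os.path.relpath(os.path.join(directory, name), cabinet)
                entries.append(rel.replace(os.sep, "/") + "/")  # trailing "/" marks a directory
            for name in files:
                rel = os.path.relpath(os.path.join(directory, name), cabinet)
                entries.append(rel.replace(os.sep, "/"))
    return sorted(entries)


for entry in build_and_list():
    print(entry)
```

This prints drawer/, then drawer/folder/, then drawer/folder/letter.txt: a drawer holding a folder holding a file.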

 

To access data in a file, the file is read, which means its data becomes output from the storage device and input to the instructions requesting it.  To replace data in a file, the file is written to.  Data is written to a file by sending instructions and data from the CPU and temporary memory (respectively) to the device containing the file.  And to add data to a file, the file is appended to.  Appending is the same as writing, except that the new data is written to the end of the file instead of replacing what is already there.
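These three operations map directly onto the modes of Python’s built-in open function: "w" writes (replacing any existing data), "a" appends, and "r" reads.  A minimal sketch, assuming a throwaway file with the made-up name groceries.txt in a temporary directory:

```python
import os
import tempfile


def write_append_read() -> list[str]:
    """Write, overwrite, append to, and finally read a small file."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "groceries.txt")  # hypothetical filename
        with open(path, "w") as f:  # "w" = write: creates the file, replacing old data
            f.write("milk\n")
        with open(path, "w") as f:  # writing again replaces "milk" entirely
            f.write("eggs\n")
        with open(path, "a") as f:  # "a" = append: new data goes at the end
            f.write("bread\n")
        with open(path, "r") as f:  # "r" = read: the file's data becomes our input
            return f.read().splitlines()


print(write_append_read())  # ['eggs', 'bread']
```

The second write wiped out "milk" completely, while the append kept "eggs" and added "bread" after it.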

 

The name of a file is known as a filename (one word).  There are different qualifications of filenames that can be specified when you want to read from or write to a file: fully-qualified filenames, partial filenames, and base filenames.

 

A fully-qualified filename is one that includes the root directory or device (a drive on Windows) as well as all of the directories leading up to the actual file.  For example, a file inside a folder inside a file cabinet would have a fully-qualified name if you said “Look in the file cabinet ‘X’ for the folder ‘Y’ for the file ‘Z’”; that is, you are specifying the full path to the file.

 

A partial, or relative, filename specifies zero or more parent folders starting from the current folder you are looking in.  For example, if you were already looking in the file cabinet, you could simply look for the folder ‘Y’ with the file ‘Z’.  A partial filename might contain no directories at all, in which case it means to look in the current directory.  The operating system keeps track of the current directory for each running application; there is more on operating systems and applications below.
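Python’s pathlib module can represent both kinds of filenames without touching the disk, which makes it handy for a small sketch.  The cabinet, folder, and file names, and the current directory /cabinet_x, are hypothetical; a real program would ask the operating system for its current directory (in Python, os.getcwd()).

```python
from pathlib import PurePosixPath, PureWindowsPath

# Fully-qualified: the root (or drive, on Windows) plus every directory down to the file.
full_windows = PureWindowsPath(r"C:\cabinet_x\folder_y\file_z.txt")
full_unix = PurePosixPath("/cabinet_x/folder_y/file_z.txt")

# Partial (relative): interpreted starting from a current directory.
partial = PurePosixPath("folder_y/file_z.txt")
bare = PurePosixPath("file_z.txt")                 # no directories: look in the current one
elsewhere = PurePosixPath("/archive/file_z.txt")   # another fully-qualified name
current_directory = PurePosixPath("/cabinet_x")    # hypothetical current directory


def resolve(current: PurePosixPath, name: PurePosixPath) -> PurePosixPath:
    """Join a filename onto the current directory; an absolute name replaces it entirely."""
    return current / name


print(full_windows.is_absolute())             # True
print(full_unix.is_absolute())                # True
print(partial.is_absolute())                  # False
print(resolve(current_directory, partial))    # /cabinet_x/folder_y/file_z.txt
print(resolve(current_directory, bare))       # /cabinet_x/file_z.txt
print(resolve(current_directory, elsewhere))  # /archive/file_z.txt
```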

 

A base filename is the simplest kind of partial filename: one without any path at all; that is, it has no directories specified.  Sometimes the term base is also used with filenames to mean without path or extension.  Most systems allow a filename extension, which is a type name appended to the end of the filename[3].
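The same pathlib objects can pull a base filename and its extension out of a full path.  A small sketch with a hypothetical file; note that pathlib treats only the text after the last dot as the extension, and that the extension is still just a naming convention:

```python
from pathlib import PurePosixPath

path = PurePosixPath("/cabinet_x/folder_y/report.final.txt")  # hypothetical file

print(path.name)    # report.final.txt   base filename: no directories
print(path.stem)    # report.final       base filename without the extension
print(path.suffix)  # .txt               the extension: only the text after the last dot
print(path.parent)  # /cabinet_x/folder_y
```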

 

The length of a filename and its extension is limited by the file system specification as well as by the software that implements it.  For example, the FAT32 file system itself places no fixed limit on how deeply child directories can be nested, but Windows (from Windows 95 on) has traditionally limited a fully-qualified filename to 260 characters.

 

Personal Computers

 

Most of you will be using personal computers for your programming, so I’ll cover how these are built in a bit more depth.  That, and I haven’t had much experience with other types of computers.

 

Personal computers follow the same construct as any computer, but they are based on micro-processors, a form of CPU.  The type of micro-processor largely determines the type of PC you are using and, currently, the software you can use with it.  There are two families: Macintosh and IBM-compatible.  The IBM-compatible market is usually generalized under the term “PC” even though Macs and IBM-compatibles are both technically PCs.  So, if you hear “Mac” and “PC” as two separate things, this is the distinction being made.

 

Computers in the IBM-compatible category of personal computers are descendants of the original IBM personal computer and are at least somewhat compatible with its architecture.  Thus, the term “IBM-compatible” is the only title that is relatively useful in grouping this rag-tag group of computers.  Currently there are two giants battling for dominance in the processor (short for ‘micro-processor’ in the PC world) realm of IBM-compatibles: Intel and AMD.  This gives you another way to determine whether your computer is an IBM-compatible: it contains an Intel or AMD processor.  IBM-compatible micro-processors are also “x86-compatible”.  The term “x86” refers to the CPU architecture and the instructions it understands.

 

Currently, Macintosh computers use PowerPC processors, which are built by Motorola and IBM.  The Macintosh came about after the IBM-compatibles had already caught fire in the early 80’s and is still a minority in the market.  Macintosh computers use a significantly different CPU architecture (among other things).

 

Programming

 

Writing instructions for a computer is known as computer programming.  Programming is the art of forcing something to follow your specific instructions.  You have most likely had some programming experience before, even if not with computers.  Parents program their children to behave a certain way.  You may have had a pet that you trained, by programming it to respond appropriately to your instructions or certain events.  And the elite among us have successfully had a T.V. show taped automatically with our VCR or other video recorder.

 

The basis of programming anything is instructions.  Machines obey instructions regardless of their good or ill intent, but the instructions must be very specific and exact in order to be obeyed correctly.  The process of reading instructions and performing actions is known as executing or execution.  A machine will do exactly what it is told unless its hardware is flawed.  Almost all problems with computers occur at the instruction level, not the hardware level.

A group of instructions for a computer is known as a program.  In order to create programs, instructions must be written and placed onto a device that can be accessed by the CPU.  Typically instructions are placed in files on a mass-storage device such as a hard drive.  If the instructions exist in writable memory, they are known as a software program or simply software.  The term software is also used to group a set of related programs and data together.

 

An application, also known as a user-application, is a program that has a user interface which is one or more ways of interacting with a user.  Interaction with a user typically involves both input and output, such as a keyboard for input and a monitor for output.  When I use the term application, I assume that it refers to a program that interacts through both input and output.

 

If you have directions to someone’s home then you are following instructions.  Those instructions could be construed as a program.  So, in quick review a program is a group of one or more instructions and software is a group of one or more programs and their data.

 

Operating Systems

 

Most applications are run on a specific operating system.  An operating system is a computer program that runs by default when the computer starts up after being turned on and provides a common user interface to all applications run through it.  When an application is written for a specific operating system, it gets to use whatever functionality that operating system provides.  The application is also insulated from the hardware, because most operating systems block direct access to it, favoring instead the use of their own functionality.

 

An application that provides a common user interface on top of the computer’s native operating system is known as a shell.  Some older versions of Windows, for example, were actually more like DOS shells than true operating systems.

 

A program written for an operating system comes in the form of an executable file.  An executable file is one that the operating system knows how to load and execute.  An operating system executes a file either by interpreting its instructions itself or by sending those instructions to the CPU.  The latter is faster, but if the instructions are not in machine language, the CPU cannot understand them.  Script files, such as those written in JavaScript, Perl, and VBScript, are examples of executable files that are run by an interpreter rather than by the CPU itself.
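On many systems the loader recognizes an executable by the first few bytes of its contents, sometimes called a magic number, rather than by its name alone.  A minimal Python sketch checking three well-known signatures: ELF executables (Linux and many other Unices) begin with the byte 0x7F followed by “ELF”, DOS and Windows executables begin with “MZ”, and Unix scripts often begin with “#!” followed by the interpreter to run.  The function name is made up, and real loaders check far more than this:

```python
def executable_kind(header: bytes) -> str:
    """Guess an executable's format from its first bytes (its "magic number")."""
    if header.startswith(b"\x7fELF"):
        return "ELF executable (Linux and many other Unices)"
    if header.startswith(b"MZ"):
        return "DOS/Windows executable"
    if header.startswith(b"#!"):
        return "script to be run by an interpreter"
    return "unknown"


print(executable_kind(b"\x7fELF\x02\x01\x01"))           # ELF executable (Linux and many other Unices)
print(executable_kind(b"MZ\x90\x00"))                      # DOS/Windows executable
print(executable_kind(b"#!/usr/bin/perl\nprint 'hi';\n"))  # script to be run by an interpreter
print(executable_kind(b"Dear reader,"))                    # unknown
```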

 

An interpreter is a program that reads an executable file written in a language it understands and then performs actions based on it.  Some interpreters take the form of virtual machines, which amounts to much the same thing.  Advanced interpreters work out how to make the programs they run faster while those programs are running, for example through just-in-time (JIT) compilation.
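To show what reading a file and performing actions based on it means, here is a minimal interpreter sketch in Python for a made-up, microwave-flavored language with three hypothetical instructions: SET, ADD, and PRINT.  Each line of the program text is read and carried out immediately; nothing is translated into machine language.

```python
def interpret(program: str) -> list[str]:
    """Read each line of a tiny, made-up language and carry it out."""
    output = []
    variables: dict[str, int] = {}
    for line in program.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        command, args = parts[0], parts[1:]
        if command == "SET":      # SET name value
            variables[args[0]] = int(args[1])
        elif command == "ADD":    # ADD name value
            variables[args[0]] += int(args[1])
        elif command == "PRINT":  # PRINT name
            output.append(str(variables[args[0]]))
        else:
            raise ValueError(f"unknown instruction: {command}")
    return output


source = """
SET minutes 2
ADD minutes 3
PRINT minutes
"""
print(interpret(source))  # ['5']
```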

 

BIOS

 

Below the operating system, on most computers (personal computers in particular), lurks a BIOS.  The BIOS is tied directly to the main hardware of the computer.  BIOS stands for basic input/output system, and that’s really all it does.  An operating system or application can use the BIOS for generic input/output interaction with devices, or it can access the devices directly.  Accessing devices through the BIOS is much easier, but it is also much slower.

 

The BIOS also decides which instructions to execute at startup (which is what starts the operating system in the first place) and configures certain hardware devices so that they are accessible, or at least initialized to a blank state.  A BIOS is basically a lower-level operating system that exists as firmware rather than software.

 

Machine Language

 

Instructions that a CPU understands are written in machine language, that is, the language directly understood by the target CPU.  This language is extremely difficult for most humans to understand.  The reason is that, at their base, computers reading digital instructions understand only numbers.  Period.  Imagine trying to decipher this as your grocery list:

 

101000110110010000111010000110000001010101110011100001100
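The string above is just an illustration, but even an ordinary grocery item is stored as numbers.  This Python sketch, assuming each character is encoded as one ASCII byte, shows the bits behind the word “milk” and turns them back into text:

```python
def to_bits(text: str) -> str:
    """Show the bits stored for `text` when each character is encoded as one ASCII byte."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def from_bits(bits: str) -> str:
    """Turn space-separated 8-bit groups back into ASCII text."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


encoded = to_bits("milk")
print(encoded)             # 01101101 01101001 01101100 01101011
print(from_bits(encoded))  # milk
```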

 

In the personal computer arena, this is where things mainly get divided.  Macintosh processors have their own machine language that is significantly different from that of IBM-compatibles.  Since the machine language is different, instructions must be completely rewritten in order to run natively on the other architecture.  Imagine writing top-secret instructions in French and giving them to a German (assuming that they know no other languages).

 

As newer CPUs are released, they often contain newer or completely different instruction sets.  An instruction set is the list of all of the instructions understood by a CPU of a particular machine language.  If a program’s instructions are targeted explicitly at a newer or specific brand of CPU, they probably won’t work on an older CPU even though they are written in the same machine language.  This is like telling someone to drive a stick when all they’ve known is an automatic, or to play hockey when all they’ve known is basketball.  A CPU cannot be updated as instruction sets are updated; its instruction set is built into its circuitry and cannot be changed.  Some old dogs learn new tricks, but old CPUs never do.  Remember that CPUs only know how to process machine language natively.
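The idea can be modeled in a few lines of Python.  Each hypothetical CPU is described by the instruction sets built into it (SSE and SSE2 are real x86 extensions, but the two CPU descriptions are made up), stored as a frozenset because a CPU’s instruction set cannot change.  A program runs only if every instruction set it was compiled to use is present:

```python
# Hypothetical CPUs, described by the instruction sets built into them.
# frozenset is immutable, just as a CPU's instruction set cannot be changed.
OLD_CPU = frozenset({"x86"})
NEW_CPU = frozenset({"x86", "SSE", "SSE2"})


def can_run(required: set[str], cpu: frozenset[str]) -> bool:
    """A program can run only if every instruction set it needs is built into the CPU."""
    return required <= cpu  # subset test


program_needs = {"x86", "SSE2"}  # e.g. a program compiled to use SSE2 instructions
print(can_run(program_needs, NEW_CPU))  # True
print(can_run(program_needs, OLD_CPU))  # False
print(can_run({"x86"}, NEW_CPU))        # True: in this model the newer CPU keeps the older instructions
```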

 

Logic and Higher Languages

 

Programming a machine directly in machine language is quite impractical.  Imagine having to build a castle from grains of sand.  It’s possible, and the result could be more perfect than any other castle, but it would take forever (relatively speaking, of course).  Such is the case with machine language.

 

Because of this, there are other languages that programs can be written in.  Each of these languages gets a level rating depending on how close it is to machine language.  The more insulation you have from the machine you are working with, the higher-level the language you are using.  When communicating with each other, we use spoken language and symbols such as signs.  A sign with a picture is extremely high-level because it doesn’t require you to know any language, or even how to read at all!  A sign with English lettering is mid-level because it is only useful to those who can read English.  Lower still is jargon written in English, which can only be understood by someone who recognizes the jargon and its meaning.

 

To determine the level of a language you can see where it exists between direct machine language and logic.  The term logic basically refers to the flow of making decisions and delivering output based on some form of input.  If you can learn to put ideas into logical form, you can learn any computer language.  Logic can’t be translated into machine language directly, but you can write in practically any language with the weapon of logic.

 

Some of you may have had some experience with pseudo-code, flow-charting, or UML (Unified Modeling Language), which are ways to express logic.  These are not the only ways to express logic, but they are certainly the most common.  An example of logic would be telling someone to eat a banana only if they had one and it wasn’t bad.  That’s a logical decision.  The result or output is the eating of the banana, the input is the banana, and the conditions under which the result is generated are the presence and ripeness of the banana.  Everything a computer does is logical because it knows no other way.[4]
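The banana rule translates almost word for word into code.  In this Python sketch (the function and parameter names are my own), the two conditions are the inputs and the decision to eat is the output; the loop walks through every combination of inputs like a small truth table:

```python
def should_eat_banana(has_banana: bool, is_bad: bool) -> bool:
    """Eat the banana only if there is one and it isn't bad."""
    return has_banana and not is_bad


# Print every combination of inputs and the resulting decision.
for has_banana in (False, True):
    for is_bad in (False, True):
        print(f"has banana={has_banana}, bad={is_bad} -> eat={should_eat_banana(has_banana, is_bad)}")
```

Only one of the four combinations (has a banana, and it is not bad) produces eat=True.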

 

High-level logical instructions can be translated into almost any language, computer or human.  Computers are better equipped to deal with some instructions than others, as is the case with humans.  A computer can devour billions of numerical calculations in a few minutes and do something useful with the result, whereas it would take a human many years to do the same thing.  But a computer can’t drink a milkshake and benefit from it as easily as a human.  It seems humans are still useful for something. :)

 

Instructions written in any computer language are known as source code, or just code, which is stored in source files.  This code is readable by those who understand the specific language, but not by the CPU itself.  A CPU only understands machine language.  Code in other languages must be translated into machine language by another program or programs.  The number of programs it takes, and the complexity of this process, differ from language to language and usually depend on how low- or high-level the language is.  A low-level language is easier to translate to machine language because it resembles machine language more closely.  Similarly, it’s easier to learn a language that is similar to your own than one that is completely alien: a speaker of one Asian language will often learn another Asian language more easily than a European would.

 

Assembler

 

Assembler, also known as assembly language, is a category of languages that fall closest to machine language and are therefore low-level.  Assembler code is almost always a direct representation of machine language in human-readable form.  Thus, you work with simple words, mostly abbreviated, that correlate directly to specific numbers in the machine language, which result in specific instructions in the CPU.  Because it is so low-level, assembler is still very difficult for most humans to understand, and most of the logic must be built up by the programmer.  The reason for this is that logical expressions must be broken down into very tiny parts that the computer can swallow.  The result looks very flat and non-logical, when in fact it is still logical.  Basically, it’s difficult to see the logic in assembler.

 

Specifications for a machine language will include information about it in assembler terms.  For this reason there is usually only one assembly language per machine language.  The variations among forms of assembly language come from syntactic elements like the order of the parameters for instructions and the symbols used to reference registers and other things.  There are two main forms of assembly language for the x86 (IBM-compatible) machine language: Intel and AT&T.

 

The process of translating assembler into machine language is known as assembling.  This process is very fast because of the direct correlation of the terms in the source to the numerical instructions and data in the corresponding machine language.
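Because each assembler word stands for a specific number, the core of assembling is little more than a table lookup.  This toy Python sketch handles only four real one-byte 32-bit x86 instructions written in Intel syntax; a real assembler also handles operands, labels, and many more instructions, none of which is modeled here:

```python
# A few real one-byte x86 (32-bit) instructions and their machine-language numbers.
OPCODES = {
    "nop": 0x90,       # do nothing
    "push ebp": 0x55,  # save the EBP register on the stack
    "pop ebp": 0x5D,   # restore the EBP register from the stack
    "ret": 0xC3,       # return from the current procedure
}


def assemble(source: str) -> bytes:
    """Translate each line of assembler into its machine-language byte by table lookup."""
    return bytes(OPCODES[line.strip()] for line in source.splitlines() if line.strip())


machine_code = assemble("""
    push ebp
    nop
    pop ebp
    ret
""")
print(machine_code.hex())  # 55905dc3
```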

 

Compiled Languages

 

Other languages that are translated into machine language, such as C++, are known as compiled languages[5].  Compilation is different from assembling because the source can’t be translated directly; it must first be analyzed (interpreted) and then translated.  The software that performs this step is known as a compiler.  The compiler generates intermediate files known as object files.  An object file usually already contains machine language (or code in some other target language), but it is not pure, because it also contains additional data used in linking, such as the names of functions it uses from other files.  The final step is performed by a linker, which links all the resulting object files together and creates a single executable file.
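The linking step can be sketched in Python with a deliberately simplified, hypothetical object-file format: each object file lists the functions it defines (with stand-in machine code) and the functions it needs from elsewhere.  The file and function names are made up, and a real linker also fixes up addresses inside the code, which this sketch does not attempt:

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """A simplified, hypothetical object file: compiled code plus data used in linking."""
    name: str
    code: dict[str, bytes]                        # functions defined here -> their machine code
    needs: set[str] = field(default_factory=set)  # functions used here but defined elsewhere


def link(objects: list[ObjectFile]) -> bytes:
    """Combine object files into one 'executable', or fail if a needed function is missing."""
    defined: dict[str, bytes] = {}
    for obj in objects:
        for symbol, body in obj.code.items():
            if symbol in defined:
                raise ValueError(f"{symbol} is defined more than once")
            defined[symbol] = body
    for obj in objects:
        missing = obj.needs - set(defined)
        if missing:
            raise ValueError(f"{obj.name}: undefined reference to {sorted(missing)}")
    return b"".join(defined.values())


main_o = ObjectFile("main.o", {"main": b"\x01"}, needs={"heat_food"})
microwave_o = ObjectFile("microwave.o", {"heat_food": b"\x02"})

print(link([main_o, microwave_o]).hex())  # 0102
try:
    link([main_o])  # microwave.o left out
except ValueError as err:
    print(err)  # main.o: undefined reference to ['heat_food']
```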

 

The executable file might not be in native machine language.  A compiler will generally create an executable for a target operating system.  Cross-compilation is the process of compiling for an operating system or machine language other than the one the compiler itself runs on.  Unlike assembler, which is closely tied to a specific machine language, compiled languages can usually be compiled into executables that run on different operating systems.  Thus the concept of source-level portability is created.

 

Source-level portability or compatibility is when the same source code can be compiled for multiple platforms.  A platform is an operating system or machine language (the latter when there is no operating system).  Binary-level compatibility is when a compiled executable can be run on different platforms without modification.  Across machine languages, this level of compatibility is limited to interpreted executables (Java/C#).  Sometimes different operating systems using the same machine language can share compiled executables, but this is rare and usually only between operating systems from the same company.  For example, Microsoft makes various operating systems that can usually share the same executables.  Emulators are programs that can run executables that cannot be run natively on the current platform, whether because of executable-format incompatibilities, machine-language differences, or other reasons.

 

Imagine an executable as a book, with interpreted executables as picture books and native executables (machine code) as text written in a specific human language.  You can give a picture book to anyone, but it will typically be thicker and take longer to read because the pictures must be interpreted.  The written text can be read quickly, but only by those who know the language.  An emulator would be someone reading the book to you in your language.  The problem with this is that some things written in one language cannot be translated directly into another, and some things take longer to say than others.

 

Modules

 

Some types of executable files cannot be executed directly.  These files are known as dynamically linkable or shared modules because they must be utilized by another executable that can be run directly.  These files are typically DLLs (dynamic link libraries) on Windows systems and SOs (shared objects) on Unices.

 

Other executable files cannot be used at all unless they are built into another executable.  These are known as static libraries or modules.  Static libraries are compiled code that can be linked into executables to provide additional functionality.  They are static because the linking is done at compile-time rather than run-time.  The term compile-time means “while in the compilation process” (includes linking).  Run-time means “while the executable is executing”.

 

For example, your genes would be static modules, because they are built into you before you are born.  Jeans, on the other hand, would be dynamic modules, because you can wear whichever ones (or none) you want after you’ve been born.  As with dynamic modules of any kind, you must know how to wear jeans and fit into them for it to work.  Dynamic modules cannot simply be used by any executable; the executable using a dynamic module must have some knowledge of it in order to use it.
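Loading a real DLL or shared object from Python depends on the platform, so this sketch uses Python’s own run-time imports as an analogy instead.  The helper name call_from_module is made up.  The module is loaded only while the program runs, and the program must already know the module’s name, the function’s name, and what arguments it takes, just as an executable must know how to “wear” a dynamic module:

```python
import importlib


def call_from_module(module_name: str, function_name: str, *args):
    """Load a module while the program runs, then look up and call one of its functions."""
    module = importlib.import_module(module_name)  # loaded at run-time, not built in beforehand
    function = getattr(module, function_name)      # we must already know the function's name...
    return function(*args)                         # ...and what arguments it expects


print(call_from_module("math", "sqrt", 16.0))  # 4.0
```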

 

Summary

 

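A computer is a machine that is programmed through symbolic, usually digital, means.  It takes input, processes instructions in its CPU, keeps data in temporary and permanent storage, and produces output.  Data in storage is organized into files and directories by a file system.  Programs are groups of instructions; applications run on operating systems, which in turn rest on the BIOS and the hardware.  A CPU understands only machine language, so programs written in other languages must be assembled, compiled and linked, or interpreted before they can run, and they may be split into static and dynamic modules.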

 

For More Information

 

The following websites and books provide in-depth information to gently introduce you to the concept of computers and programming:

 



[1] Almost all hardware is digital these days.  The term digital as it relates to devices means that they read, write, store, and/or process numerical information.  The opposite of digital is analogue, which centers on continuously variable, measurable, physical quantities.  The big difference is that digital data is limited to specific, discrete values, while analogue data can vary continuously.  When I speak of computers, I am assuming they are based on digital data and instructions.

[2] Almost all ROM can be updated in some way.  The two most common ways are flashing and physically replacing a ROM chip.  Flashing is a process of replacing an entire block of ROM with new data.  This sounds like a violation of ‘read-only’, but strangely it’s not.  The term ‘read-only’ is fairly relative, and you might find that it means different things for different memory in different places. :)

[3] The extension is merely a hint at the type of the file and does not guarantee the validity of the file’s contents.  For example, you could write anything on the cover of a book and yet have its contents be completely different.  This would be annoying and improbable, but such is the case with filename extensions as well.

[4] Intuition and faith are at the opposite end of the spectrum from logic.  You perform an action based on intuition when you do not rely on any logical data; for example, you might trust someone to do what you ask even though he has a bad track record.  Trust is often based more on intuition and faith than on logic.

[5] Some compiled languages are compiled into other languages that are still interpreted, though at a much faster rate.  Java, for example, is compiled into Java bytecode, which must be run by a Java Virtual Machine (an interpreter).